A new code-reasoning LLM fine-tuned from DeepSeek-R1-Distill-Qwen-14B using distributed RL with GRPO+ and iterative context lengthening. Trained on ~24K coding problems (TACO-Verified, PrimeIntellect SYNTHETIC-1, LCB v5), it lifts Pass@1 on LiveCodeBench v5 to 60.6%, +7.6% over the base model and on par with OpenAI o3-mini.
- GRPO+: removes the KL and entropy loss terms for stability; adds offline difficulty filtering, DAPO-inspired loss masking, and reward clipping (see the loss sketch after this list).
- Iterative context scaling: 16K → 32K → 64K generalization with improved long-context reasoning (schedule helper below).
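
The post lists GRPO+'s changes without code, so here is a minimal PyTorch sketch of what such a per-token group-relative loss could look like. The function name, tensor shapes, clip values, and reward bounds are all illustrative assumptions, not the authors' implementation:

```python
import torch

def grpo_plus_loss(logp_new, logp_old, rewards, loss_mask,
                   clip_low=0.2, clip_high=0.28,
                   reward_clip=(-1.0, 1.0)):
    """Hypothetical GRPO+-style loss over a group of G rollouts.

    logp_new, logp_old: (G, T) per-token log-probs under new/old policy
    rewards:            (G,) scalar outcome reward per rollout (G > 1)
    loss_mask:          (G, T) 1 for tokens that count; 0 masks out e.g.
                        truncated overlong rollouts, as in DAPO
    """
    # Reward clipping: bound outcome rewards before computing advantages.
    r = rewards.clamp(*reward_clip)

    # Group-relative advantage: normalize rewards within the rollout group.
    adv = (r - r.mean()) / (r.std() + 1e-8)   # (G,)
    adv = adv.unsqueeze(1)                     # broadcast over tokens

    # PPO-style clipped surrogate with an asymmetric ("clip-high") band
    # in the spirit of DAPO. No KL or entropy term is added, matching
    # the removal described above.
    ratio = (logp_new - logp_old).exp()
    unclipped = ratio * adv
    clipped = ratio.clamp(1 - clip_low, 1 + clip_high) * adv
    per_token = -torch.min(unclipped, clipped)

    # Loss masking: average only over unmasked tokens.
    return (per_token * loss_mask).sum() / loss_mask.sum().clamp(min=1)
```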
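Iterative context lengthening amounts to raising the rollout length cap in stages during training. A tiny self-contained helper to make the idea concrete; the 16K/32K values come from the post, while the step boundary and function name are made-up placeholders (per the post, 64K behavior is reached via generalization rather than a third training stage):

```python
def max_context_for_step(step: int,
                         schedule=((0, 16_384), (1_000, 32_768))) -> int:
    """Return the rollout length cap at a given RL training step.

    Hypothetical helper: each (start_step, max_tokens) pair opens a new
    stage; the cap of the latest stage whose start has been reached applies.
    """
    cap = schedule[0][1]
    for start, length in schedule:
        if step >= start:
            cap = length
    return cap

assert max_context_for_step(0) == 16_384      # first stage: 16K rollouts
assert max_context_for_step(2_000) == 32_768  # second stage: 32K rollouts
```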
Eval: strong results on LiveCodeBench, Codeforces, and HumanEval+.
Open weights:
https://huggingface.co/agentica-org/DeepCoder-14B-Preview
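
Since the weights are open, a minimal load-and-generate example with Hugging Face transformers (the model id is taken from the link above; the prompt and generation settings are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "agentica-org/DeepCoder-14B-Preview"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

prompt = "Write a Python function that checks whether a string is a palindrome."
messages = [{"role": "user", "content": prompt}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning models emit long chains of thought; leave room for them.
outputs = model.generate(inputs, max_new_tokens=4096)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:],
                       skip_special_tokens=True))
```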
@opendatascience